photorealistic image
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
- North America > United States > Texas (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.96)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
An indicator for effectiveness of text-to-image guardrails utilizing the Single-Turn Crescendo Attack (STCA)
Kwartler, Ted, Bagan, Nataliia, Banny, Ivan, Aqrawi, Alan, Abbasi, Arian
The Single-Turn Crescendo Attack (STCA), first introduced in Aqrawi and Abbasi [2024], is an innovative method designed to bypass the ethical safeguards of text-to-text AI models, compelling them to generate harmful content. This technique leverages a strategic escalation of context within a single prompt, combined with trust-building mechanisms, to subtly deceive the model into producing unintended outputs. Extending the application of STCA to text-to-image models, we demonstrate its efficacy by compromising the guardrails of a widely used model, DALL-E 3, achieving outputs comparable to those from the uncensored model Flux Schnell, which served as a baseline control. This study provides a framework for researchers to rigorously evaluate the robustness of guardrails in text-to-image models and benchmark their resilience against adversarial attacks.
- Research Report > Promising Solution (0.34)
- Research Report > Experimental Study (0.34)
- Health & Medicine (0.95)
- Information Technology > Security & Privacy (0.88)
- Law (0.69)
Relations, Negations, and Numbers: Looking for Logic in Generative Text-to-Image Models
Conwell, Colin, Tawiah-Quashie, Rupert, Ullman, Tomer
Despite remarkable progress in multi-modal AI research, there is a salient domain in which modern AI continues to lag considerably behind even human children: the reliable deployment of logical operators. Here, we examine three forms of logical operators: relations, negations, and discrete numbers. We asked human respondents (N=178 in total) to evaluate images generated by a state-of-the-art image-generating AI (DALL-E 3) prompted with these 'logical probes', and find that none reliably produce human agreement scores greater than 50%. The negation probes and numbers (beyond 3) fail most frequently. In a 4th experiment, we assess a 'grounded diffusion' pipeline that leverages targeted prompt engineering and structured intermediate representations for greater compositional control, but find its performance is judged even worse than that of DALL-E 3 across prompts. To provide further clarity on potential sources of success and failure in these text-to-image systems, we supplement our 4 core experiments with multiple auxiliary analyses and schematic diagrams, directly quantifying, for example, the relationship between the N-gram frequency of relational prompts and the average match to generated images; the success rates for 3 different prompt modification strategies in the rendering of negation prompts; and the scalar variability / ratio dependence ('approximate numeracy') of prompts involving integers. We conclude by discussing the limitations inherent to 'grounded' multimodal learning systems whose grounding relies heavily on vector-based semantics (e.g. DALL-E 3), or under-specified syntactical constraints (e.g. 'grounded diffusion'), and propose minimal modifications (inspired by development, based in imagery) that could help to bridge the lingering compositional gap between scale and structure. All data and code are available at https://github.com/ColinConwell/T2I-Probology
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.93)
DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training
Xin, Chen, Hartel, Andreas, Kasneci, Enkelejda
Swift and accurate detection of specified objects is crucial for many industrial applications, such as safety monitoring on construction sites. However, traditional approaches rely heavily on arduous manual annotation and data collection, which struggle to adapt to ever-changing environments and novel target objects. To address these limitations, this paper presents DART, an automated end-to-end pipeline designed to streamline the entire workflow of an object detection application from data collection to model deployment. DART eliminates the need for human labeling and extensive data collection while excelling in diverse scenarios. It employs a subject-driven image generation module (DreamBooth with SDXL) for data diversification, followed by an annotation stage where open-vocabulary object detection (Grounding DINO) generates bounding box annotations for both generated and original images. These pseudo-labels are then reviewed by a large multimodal model (GPT-4o) to guarantee credibility before serving as ground truth to train real-time object detectors (YOLO). We apply DART to a self-collected dataset of construction machines named Liebherr Product, which contains over 15K high-quality images across 23 categories. The current implementation of DART significantly increases average precision (AP) from 0.064 to 0.832. Furthermore, we adopt a modular design for DART to ensure easy exchangeability and extensibility. This allows for a smooth transition to more advanced algorithms in the future, seamless integration of new object categories without manual labeling, and adaptability to customized environments without extra data collection. The code and dataset are released at https://github.com/chen-xin-94/DART.
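To make the flow of the four stages easier to follow, here is a minimal structural sketch of the pipeline as described in the abstract; every helper function name below is a hypothetical placeholder rather than the authors' API, and the actual implementation lives in the linked repository.

```python
# Structural sketch only: the four DART stages chained end to end.
# Each helper is a hypothetical placeholder standing in for the corresponding
# model (DreamBooth + SDXL, Grounding DINO, GPT-4o, YOLO); see
# https://github.com/chen-xin-94/DART for the real code.

def dart_pipeline(original_images, class_names):
    # 1. Data diversification: subject-driven image generation.
    generated_images = diversify_with_dreambooth_sdxl(original_images, class_names)

    # 2. Open-vocabulary annotation: propose bounding boxes for both
    #    the original and the generated images.
    pseudo_labels = annotate_with_grounding_dino(
        original_images + generated_images, class_names
    )

    # 3. Pseudo-label review: a large multimodal model accepts or rejects
    #    each box before it is promoted to ground truth.
    vetted_labels = review_with_multimodal_model(pseudo_labels)

    # 4. Model training: fit a real-time detector on the vetted labels.
    return train_yolo_detector(vetted_labels)
```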
- Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Materials > Metals & Mining (1.00)
- Machinery > Construction Machinery & Heavy Trucks (1.00)
- Energy (1.00)
- Construction & Engineering (1.00)
Semantic Augmentation in Images using Language
Yerramilli, Sahiti, Tamarapalli, Jayant Sravan, Kulkarni, Tanmay Girish, Francis, Jonathan, Nyberg, Eric
Deep Learning models are incredibly data-hungry and require very large labeled datasets for supervised learning. As a consequence, these models often suffer from overfitting, limiting their ability to generalize to real-world examples. Recent advancements in diffusion models have enabled the generation of photorealistic images based on textual inputs. Leveraging the substantial datasets used to train these diffusion models, we propose a technique to utilize generated images to augment existing datasets. This paper explores various strategies for effective data augmentation to improve the out-of-domain generalization capabilities of deep learning models.
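As a rough illustration of the general idea (not the paper's specific method), the sketch below uses an off-the-shelf text-to-image diffusion model from the `diffusers` library to synthesize extra training images for a labeled dataset; the checkpoint name, class labels, and prompt template are all illustrative assumptions.

```python
# Minimal sketch: generate synthetic training images per class with an
# off-the-shelf diffusion model, then fold them into the labeled dataset.
# Checkpoint, classes, and prompts are illustrative, not from the paper.
from pathlib import Path

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

class_names = ["golden retriever", "red sports car"]  # hypothetical classes
images_per_class = 4
out_dir = Path("augmented")
out_dir.mkdir(exist_ok=True)

for label in class_names:
    for i in range(images_per_class):
        # Vary the described context to push samples away from the source domain.
        prompt = f"a photorealistic photo of a {label} in heavy rain at dusk"
        image = pipe(prompt).images[0]
        image.save(out_dir / f"{label.replace(' ', '_')}_{i}.png")
```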
MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation
Bellagente, Marco, Brack, Manuel, Teufel, Hannah, Friedrich, Felix, Deiseroth, Björn, Eichenberg, Constantin, Dai, Andrew, Baldock, Robert, Nanda, Souradeep, Oostermeijer, Koen, Cruz-Salinas, Andres Felipe, Schramowski, Patrick, Kersting, Kristian, Weinbach, Samuel
The recent popularity of text-to-image diffusion models (DMs) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of text prompts. However, expressing complex or nuanced ideas in text alone can be difficult. To ease image generation, we propose MultiFusion, which allows one to express complex and nuanced concepts with arbitrarily interleaved inputs of multiple modalities and languages. MultiFusion leverages pre-trained models and aligns them for integration into a cohesive system, thereby avoiding the need for extensive training from scratch. Our experimental results demonstrate the efficient transfer of capabilities from individual modules to the downstream model. Specifically, the fusion of all independent components allows the image generation module to utilize multilingual, interleaved multimodal inputs despite being trained solely on monomodal data in a single language.
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
- North America > United States > Texas (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Google's AI-powered search tool can help tackle your holiday shopping
Google is scaling up Search Generative Experience (SGE) for holiday shopping. The company announced Thursday that its AI-powered search bot can now spit out gift ideas, photorealistic images of product types and virtual try-ons of men's tops. Google SGE launched in May, offering AI-driven answers and suggestions to complement the search engine's standard web results. The company has since added follow-up queries, better translations and interactive definitions in more complex subjects. The tool requires Chrome on desktop or the Google mobile app on smartphones.
- Retail (0.62)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.62)
- Consumer Products & Services (0.62)
AI imager Midjourney v5 stuns with photorealistic images--and 5-fingered hands
On Wednesday, Midjourney announced version 5 of its commercial AI image synthesis service, which can produce photorealistic images at a quality level that some AI art fans are calling creepy and "too perfect." Midjourney v5 is available now as an alpha test for customers who subscribe to the Midjourney service, which is available through Discord. "MJ v5 currently feels to me like finally getting glasses after ignoring bad eyesight for a little bit too long," said Julie Wieland, a graphic designer who often shares her Midjourney creations on Twitter. "Suddenly you see everything in 4k, it feels weirdly overwhelming but also amazing." Wieland shared some of her Midjourney v5 generations with Ars Technica (seen below in a gallery and in the main image above), and they certainly show a progression in image detail since Midjourney first arrived in March 2022.
How to choose a Sentence Transformer from Hugging Face
As a quick recap, Domain describes, at a high level, what the dataset is about. In addition to Domain, there are many Tasks used to produce vector embeddings. Unlike language models, most of which share the training task of predicting a masked-out token, embedding models are trained with a much broader set of objectives. For example, Duplicate Question Detection might perform better with a different model than one trained for Question Answering. A good rule of thumb is to pick a model that was trained within the same domain as your use case.
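For instance, a minimal sketch of that comparison with the `sentence-transformers` library might look like the following; the two checkpoint names are illustrative picks (one general-purpose, one trained on question-answer pairs), not recommendations tied to the examples above.

```python
# Minimal sketch: score the same query/document pairs with two checkpoints
# trained on different tasks, to see how the rankings differ.
from sentence_transformers import SentenceTransformer, util

models = {
    "general-purpose": SentenceTransformer("all-MiniLM-L6-v2"),
    "question-answering": SentenceTransformer("multi-qa-MiniLM-L6-cos-v1"),
}

query = "photorealistic image"
documents = [
    "An indicator for effectiveness of text-to-image guardrails",
    "Semantic Augmentation in Images using Language",
    "Google's AI-powered search tool can help tackle your holiday shopping",
]

for name, model in models.items():
    query_emb = model.encode(query, convert_to_tensor=True)
    doc_embs = model.encode(documents, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, doc_embs)[0]
    print(name, [round(float(s), 3) for s in scores])
```

In practice, the checkpoint whose training task and domain most closely match your retrieval scenario will usually produce the more useful ranking.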